Harden Photos sync output path handling#220
Harden Photos sync output path handling#220MrJarnould wants to merge 14 commits intotimlaing:mainfrom
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 38 minutes and 51 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughReplaces the monolithic Photos implementation with a modular CloudKit-backed and legacy photos service, adds a typed CloudKit client, a full photo sync engine with state persistence and materialisation, expands CLI photo commands (libraries/get/changes/sync/sync-cursor/watch), and adds extensive tests and fixtures. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant PhotosService
participant CloudKitClient
participant PhotoSyncState
participant Filesystem
User->>CLI: photos sync --library root
CLI->>PhotosService: sync(options)
PhotosService->>PhotoSyncState: open state DB / get cursor
PhotosService->>CloudKitClient: database_changes(since=cursor)
CloudKitClient-->>PhotosService: zone/asset records
loop for each asset
PhotosService->>PhotoSyncState: check resource manifest
alt resource not current
PhotosService->>CloudKitClient: download_asset_bytes(url)
CloudKitClient-->>PhotosService: bytes
PhotosService->>Filesystem: write file (atomic)
PhotosService->>PhotosService: apply EXIF/XMP/RAW policies
PhotosService->>PhotoSyncState: upsert resource
else
PhotosService-->>PhotoSyncState: record skip
end
end
PhotosService->>PhotoSyncState: set_sync_cursor(new_cursor)
PhotosService-->>CLI: return sync result
CLI-->>User: display summary / JSON
sequenceDiagram
participant User
participant CLI as CLI<br/>photos watch
participant Service as PhotosService
participant Sync as run_photo_sync
participant Output as UI
User->>CLI: photos watch --iterations 3 --interval 30
CLI->>Service: watch(options, interval_seconds=30, iterations=3)
loop i from 1 to 3
Service->>Sync: run_photo_sync(options)
Sync-->>Service: PhotoSyncResult
Service->>Output: print iteration status and results
alt not last iteration
Service->>Service: sleep(30)
end
end
Service-->>CLI: finished
CLI-->>User: exit
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (1)
tests/fixtures/photos_album_create_response.json (1)
1-16: LGTM! Valid CloudKit album creation response fixture.The JSON structure is well-formed and correctly represents a CloudKit album creation response. The base64-encoded album name and response format are appropriate for testing the album creation flow.
Given the PR's focus on path-safety hardening, you may want to ensure that your test suite also includes fixtures with edge cases such as album names containing path traversal sequences (
../), special filesystem characters, or unusually long names—though these may already be covered in other test files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/fixtures/photos_album_create_response.json` around lines 1 - 16, Add additional test fixtures alongside photos_album_create_response.json covering edge-case album names: create variants where the fields.albumNameEnc value is base64 for path traversal sequences ("../"), names with special filesystem characters (e.g., ":" "\" "*" "?" "<" ">" "|"), and an unusually long name (e.g., >255 chars) to validate path-safety hardening. Ensure each fixture keeps the same CloudKit structure (records[*].recordName, recordChangeTag, fields.albumNameEnc, fields.isDeleted) so tests can reuse parsing logic that reads albumNameEnc and confirm sanitization or rejection behavior in the code that handles album creation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples.py`:
- Around line 451-453: The code constructing album_name uses deprecated
datetime.utcnow(); update the expression to use a timezone-aware call by
replacing datetime.utcnow().strftime(...) with
datetime.now(timezone.utc).strftime(...) (ensure timezone is imported) so
album_name generation (the f-string around datetime/... and uuid4().hex[:8])
remains unchanged but uses datetime.now(timezone.utc) instead of the deprecated
utcnow.
In `@pyicloud/cli/commands/photos.py`:
- Around line 944-976: The CLI prints the "Starting photo watch" message before
the iterator's internal sleep because the sleep happens when
next(watch_payloads) resumes; to fix, either (A) move the wait/sleep
orchestration out of the iterator and into this loop so you call
_print_photo_watch_wait() then sleep and only then call next(watch_payloads) and
_print_photo_watch_start(), or (B) change the iterator that produces
watch_payloads (watch_photo_sync) to yield a payload immediately and perform its
sleep after yielding (i.e., yield before sleeping) so the CLI's
_print_photo_watch_start() reflects actual start timing; update references in
this file to use the chosen approach and ensure _print_photo_watch_wait,
_print_photo_watch_start, and next(watch_payloads) are called in the corrected
order.
In `@pyicloud/common/cloudkit/client.py`:
- Around line 131-137: The asset GET error path currently maps 429 responses to
a generic CloudKitApiError; change it to detect HTTP 429 and raise
CloudKitRateLimited (same as post()) so callers can back off, extracting the
Retry-After header from resp.headers (e.g., retry_after =
resp.headers.get("Retry-After")) and include it on the CloudKitRateLimited
exception while leaving 401/403 to raise CloudKitAuthError and other >=400
responses to raise CloudKitApiError (preserve existing payload extraction via
getattr(resp, "text", None)); update the logic near the code that currently
raises CloudKitAuthError/CloudKitApiError for asset GETs to implement this 429
handling.
In `@pyicloud/services/photos_cloudkit/client.py`:
- Around line 141-147: Wrap the model validation calls in batch_count and
upload_file so validation errors are converted to CloudKitApiError: in
batch_count, move PhotosBatchCountResponse.model_validate(raw_data) into the try
block that currently catches the extraction error, catch Exception as exc, and
raise CloudKitApiError("Photos count query failed", payload=raw_data) from exc
(don't reference data since it may not exist); likewise in upload_file wrap the
relevant Response model_validate call in a try/except and raise CloudKitApiError
with payload=raw_data when validation fails, preserving the existing error
message patterns and using CloudKitApiError as the public boundary.
In `@pyicloud/services/photos_cloudkit/mappers.py`:
- Around line 28-40: The current branch encodes value with value.encode("ascii")
outside the try block causing UnicodeEncodeError for already-decoded non-ASCII
strings; change the str branch to set raw = value (keep bytes branch as-is) and
move the ASCII encoding into the guarded block so the code attempts
base64.b64decode(raw.encode("ascii")).decode("utf-8") inside the try, on failure
return raw (the original string) in the inner fallback, and only log via
LOGGER.debug(record_name(record), field_name) and return None as the final
fallback; reference symbols: value, raw, base64.b64decode, LOGGER.debug,
record_name, field_name, record.
In `@pyicloud/services/photos_cloudkit/materialize.py`:
- Around line 411-432: _jpeg_has_exif_datetime currently forces little-endian
when reading the first IFD offset which breaks big-endian (MM) EXIF; change the
code to detect the TIFF byte order from the EXIF payload (check
exif_payload[0:2] == b"MM" for big-endian vs b"II" for little-endian) and pass
the correct byte order to _read_uint32 (instead of the hardcoded b"<"), so the
initial call that computes the first IFD offset in _jpeg_has_exif_datetime uses
the detected byte order before calling _parse_tiff_ifd and subsequent tag
readers.
In `@pyicloud/services/photos_cloudkit/service.py`:
- Around line 1283-1286: The fullname method may KeyError or recurse forever for
missing or cyclic parents; update Album.fullname to defensively resolve parent
chain: iterate up using self._parent_id and lookups on self._library.albums (use
a safe lookup that does not raise, e.g., AlbumContainer.get or try/except around
AlbumContainer.__getitem__), build path segments iteratively while tracking
visited parent ids to detect cycles, and stop/return a sensible fallback (e.g.,
the current name or a partial path) when a parent is missing or a cycle is
detected instead of recursing. Ensure you reference and update the fullname
method on Album and use the _parent_id and self._library.albums symbols for
locating the lookup and cycle detection.
- Around line 1477-1515: The legacy branch in _get_len blindly indexes
response["batch"][0]["records"][0]["fields"]["itemCount"]["value"] and can raise
IndexError/KeyError; update _get_len to defensively parse the JSON from the
POST: verify response is a dict, that "batch" exists and is non-empty, that
batch[0] contains "records" with at least one entry, and that that record has
"fields"→"itemCount"→"value"; if any of those checks fail or an exception
(IndexError/KeyError/TypeError/ValueError) occurs, fall back to a safe default
(e.g., return 0) or—if self._client is available—call the typed path
self._client.batch_count(container_id=self._get_container_id,
zone_id=self._zone_id); ensure the function returns an int and does not let raw
indexing errors propagate.
In `@pyicloud/services/photos_cloudkit/sync.py`:
- Around line 203-215: The current logic allows assets to be marked
asset_ready_for_delete while --until-found is enabled, which can skip resources
and prematurely delete remote assets; modify the delete-determination code (the
block that sets/uses asset_ready_for_delete in sync.py, including the loop/logic
referenced around the later 308-385 range) so that when options.until_found is
not None you never mark an asset as ready for remote deletion unless all of its
resources have been explicitly confirmed local (or simply disable
auto-delete/keep_icloud_recent_days behavior when options.until_found is set).
Specifically, guard the branch that sets asset_ready_for_delete with a check
against options.until_found (or add a flag that requires full-resource
confirmation) so remote deletion paths are skipped if until_found could have
skipped resources.
- Around line 217-229: The code currently always creates the output directory
via options.directory.mkdir(...) even in preview modes; change it to only create
the directory for real sync runs by wrapping the mkdir call in a condition that
checks not (options.dry_run or options.only_print_filenames). Keep the rest
intact (PhotoSyncResult construction, create_photo_sync_state call and its
ephemeral logic) so only the directory creation is gated while other behavior
remains the same.
- Around line 511-517: The loop over state.iter_resources() short-circuits based
only on path existence; update it to also compare the tracked/stored size from
the resource entry against the actual filesystem size before deciding the
resource is current. In the block that uses _safe_target_path(directory,
entry.relative_path) and checks path.exists(), fetch the expected size from the
entry (e.g., entry.size or the resource size field on entry) and compare it to
path.stat().st_size (or equivalent); if sizes differ, return False so
truncated/replaced files don't incorrectly pass. Ensure PhotosServiceException
handling remains the same.
In `@pyicloud/services/photos_legacy.py`:
- Around line 724-733: The libraries property makes a POST to
f"{self.service_endpoint}/changes/database" but omits self.params; update the
session.post call inside BasePhotosService.libraries (the block referencing
self.service_endpoint, session.post, data="{}", headers={CONTENT_TYPE:
CONTENT_TYPE_TEXT}) to pass the CloudKit request params by adding
params=self.params so the request includes required account/request flags for
proper library discovery.
In `@tests/fixtures/photos_browser_mutations/README.md`:
- Around line 8-10: In the README sentence "Stable placeholder values are used
instead so request and response relationships remain testable without exposing
personal data.", add a comma after "instead" to read "Stable placeholder values
are used instead, so request and response relationships remain testable without
exposing personal data." Edit the sentence in the README entry that contains the
phrase "Stable placeholder values are used instead so..." to insert the comma
for improved readability.
In `@tests/services/test_photos_sync.py`:
- Around line 784-795: The test currently passes added_date=None which
DummyAsset coerces back to asset_date; instead, after constructing recent_asset
and old_asset (created via DummyAsset), explicitly delete their added_date
attributes (e.g., del recent_asset.added_date and del old_asset.added_date) so
the code under test must fall back to asset_date; update references to
recent_asset and old_asset accordingly to ensure the filter exercises the
fallback path.
- Around line 680-704: The test currently relies on datetime.now() via
DummyAsset which makes the "2026:" assertion flaky; fix by creating a fixed
asset_date (e.g., datetime(2024, 6, 1, ...)) and pass it into DummyAsset so the
generated EXIF is deterministic, then update the assertions to expect the chosen
year string (e.g., "2024:") instead of "2026:". Locate the test where
DummyAsset("asset-exif", "photo.jpg", payloads=...) is created and modify it to
accept the fixed asset_date, keeping run_photo_sync and PhotoSyncOptions usage
unchanged.
---
Nitpick comments:
In `@tests/fixtures/photos_album_create_response.json`:
- Around line 1-16: Add additional test fixtures alongside
photos_album_create_response.json covering edge-case album names: create
variants where the fields.albumNameEnc value is base64 for path traversal
sequences ("../"), names with special filesystem characters (e.g., ":" "\" "*"
"?" "<" ">" "|"), and an unusually long name (e.g., >255 chars) to validate
path-safety hardening. Ensure each fixture keeps the same CloudKit structure
(records[*].recordName, recordChangeTag, fields.albumNameEnc, fields.isDeleted)
so tests can reuse parsing logic that reads albumNameEnc and confirm
sanitization or rejection behavior in the code that handles album creation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: be4b8642-c2ff-4018-8f3d-116c45b7f62b
📒 Files selected for processing (66)
README.mdexamples.pypyicloud/cli/commands/photos.pypyicloud/cli/normalize.pypyicloud/common/cloudkit/__init__.pypyicloud/common/cloudkit/client.pypyicloud/common/cloudkit/models.pypyicloud/services/photos.pypyicloud/services/photos_cloudkit/__init__.pypyicloud/services/photos_cloudkit/client.pypyicloud/services/photos_cloudkit/constants.pypyicloud/services/photos_cloudkit/mappers.pypyicloud/services/photos_cloudkit/materialize.pypyicloud/services/photos_cloudkit/models.pypyicloud/services/photos_cloudkit/queries.pypyicloud/services/photos_cloudkit/service.pypyicloud/services/photos_cloudkit/state.pypyicloud/services/photos_cloudkit/sync.pypyicloud/services/photos_legacy.pytests/fixtures/README.mdtests/fixtures/photos_album_create_response.jsontests/fixtures/photos_album_membership_query_core.jsontests/fixtures/photos_album_membership_response.jsontests/fixtures/photos_album_rename_response.jsontests/fixtures/photos_all_photos_query_core.jsontests/fixtures/photos_all_photos_response.jsontests/fixtures/photos_browser_mutations/README.mdtests/fixtures/photos_browser_mutations/album_add_photo_request.jsontests/fixtures/photos_browser_mutations/album_add_photo_response.jsontests/fixtures/photos_browser_mutations/album_create_request.jsontests/fixtures/photos_browser_mutations/album_create_response.jsontests/fixtures/photos_browser_mutations/album_delete_request.jsontests/fixtures/photos_browser_mutations/album_delete_response.jsontests/fixtures/photos_browser_mutations/album_remove_photo_request.jsontests/fixtures/photos_browser_mutations/album_remove_photo_response.jsontests/fixtures/photos_browser_mutations/album_rename_request.jsontests/fixtures/photos_browser_mutations/album_rename_response.jsontests/fixtures/photos_browser_mutations/photo_delete_request.jsontests/fixtures/photos_browser_mutations/photo_delete_response.jsontests/fixtures/photos_database_changes_response.jsontests/fixtures/photos_favorites_query_core.jsontests/fixtures/photos_favorites_response.jsontests/fixtures/photos_indexing_not_finished_response.jsontests/fixtures/photos_live_photo_response.jsontests/fixtures/photos_missing_counterparts_response.jsontests/fixtures/photos_recently_added_query_core.jsontests/fixtures/photos_recently_added_response.jsontests/fixtures/photos_shared_library_all_photos_query_core.jsontests/fixtures/photos_shared_library_all_photos_response.jsontests/fixtures/photos_shared_library_favorites_query_core.jsontests/fixtures/photos_shared_library_favorites_response.jsontests/fixtures/photos_shared_library_private_zones_response.jsontests/fixtures/photos_shared_library_shared_zones_response.jsontests/fixtures/photos_shared_library_unfavorite_request.jsontests/fixtures/photos_shared_library_unfavorite_response.jsontests/fixtures/photos_shared_library_zone_changes_request.jsontests/fixtures/photos_shared_library_zone_changes_response.jsontests/fixtures/photos_upload_duplicate_response.jsontests/fixtures/photos_upload_skeletal_response.jsontests/fixtures/photos_video_only_response.jsontests/fixtures/photos_zone_changes_response.jsontests/fixtures/photos_zones_list_response.jsontests/services/test_photos.pytests/services/test_photos_cloudkit_client.pytests/services/test_photos_sync.pytests/test_cmdline.py
| data = PhotosBatchCountResponse.model_validate(raw_data) | ||
| try: | ||
| return data.batch[0].records[0].fields.itemCount.value | ||
| except Exception as exc: | ||
| raise CloudKitApiError( | ||
| "Photos count query failed", payload=data.model_dump(mode="json") | ||
| ) from exc |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Inspect response model validation sites and malformed-payload test coverage.
rg -nP -C3 'PhotosBatchCountResponse\.model_validate|PhotosUploadResponse\.model_validate|malformed|invalid JSON|CloudKitApiError'Repository: timlaing/pyicloud
Length of output: 23106
🏁 Script executed:
cat -n pyicloud/services/photos_cloudkit/client.py | sed -n '135,150p'Repository: timlaing/pyicloud
Length of output: 797
🏁 Script executed:
cat -n pyicloud/services/photos_cloudkit/client.py | sed -n '180,195p'Repository: timlaing/pyicloud
Length of output: 619
🏁 Script executed:
# Check if there are tests specifically for validation errors (not extraction errors)
rg -n 'def test.*upload.*malformed|def test.*validation|ValidationError' tests/services/test_photos_cloudkit_client.pyRepository: timlaing/pyicloud
Length of output: 43
🏁 Script executed:
# Check what test_upload_file_raises_cloudkit_error_for_invalid_json actually tests
rg -A 15 'def test_upload_file_raises_cloudkit_error_for_invalid_json' tests/services/test_photos_cloudkit_client.pyRepository: timlaing/pyicloud
Length of output: 711
🏁 Script executed:
# Check the full _response_json implementation
cat -n pyicloud/services/photos_cloudkit/client.py | sed -n '149,169p'Repository: timlaing/pyicloud
Length of output: 937
🏁 Script executed:
# Look for tests that might test schema validation failures (not JSON parsing failures)
rg -B 2 -A 10 'test.*batch_count.*malformed' tests/services/test_photos_cloudkit_client.pyRepository: timlaing/pyicloud
Length of output: 481
🏁 Script executed:
# Check if PhotosBatchCountResponse and PhotosUploadResponse are Pydantic models
rg -n 'class PhotosBatchCountResponse|class PhotosUploadResponse' pyicloud/services/photos_cloudkit/Repository: timlaing/pyicloud
Length of output: 327
🏁 Script executed:
# Verify that the exception would indeed be ValidationError by checking imports
rg -n 'from pydantic|import.*ValidationError' pyicloud/services/photos_cloudkit/client.pyRepository: timlaing/pyicloud
Length of output: 43
🏁 Script executed:
# Check imports at top of photos_cloudkit/client.py
head -30 pyicloud/services/photos_cloudkit/client.pyRepository: timlaing/pyicloud
Length of output: 786
🏁 Script executed:
# Check the models to confirm they are Pydantic models
rg -B 2 'class PhotosBatchCountResponse' pyicloud/services/photos_cloudkit/models.py | head -10Repository: timlaing/pyicloud
Length of output: 154
🏁 Script executed:
# Look at the broader pattern in CloudKitContainerClient to see how validation errors are typically caught
rg -B 5 'ValidationError' pyicloud/common/cloudkit/client.py | head -40Repository: timlaing/pyicloud
Length of output: 1505
Normalise model-validation failures too.
Malformed successful JSON can still escape as a validation exception here, because only HTTP/JSON parsing and the nested count extraction are wrapped. Keep the public boundary as CloudKitApiError for bad Photos payloads as well.
This follows the pattern established throughout CloudKitContainerClient, where validation errors are caught and normalised (see query, lookup, zones_list, modify, and changes methods).
The two locations requiring fixes are:
- Lines 141–147 (
batch_count): Wrapmodel_validatein the try-except block and useraw_datain the error payload (sincedatawon't exist if validation fails). - Lines 184–186 (
upload_file): Wrapmodel_validatein a try-except block to catch and normalise validation errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyicloud/services/photos_cloudkit/client.py` around lines 141 - 147, Wrap the
model validation calls in batch_count and upload_file so validation errors are
converted to CloudKitApiError: in batch_count, move
PhotosBatchCountResponse.model_validate(raw_data) into the try block that
currently catches the extraction error, catch Exception as exc, and raise
CloudKitApiError("Photos count query failed", payload=raw_data) from exc (don't
reference data since it may not exist); likewise in upload_file wrap the
relevant Response model_validate call in a try/except and raise CloudKitApiError
with payload=raw_data when validation fails, preserving the existing error
message patterns and using CloudKitApiError as the public boundary.
| recent_asset = DummyAsset( | ||
| "asset-recent", | ||
| "recent.jpg", | ||
| asset_date=datetime.now(timezone.utc) - timedelta(hours=1), | ||
| added_date=None, | ||
| ) | ||
| old_asset = DummyAsset( | ||
| "asset-old", | ||
| "old.jpg", | ||
| asset_date=datetime.now(timezone.utc) - timedelta(days=10), | ||
| added_date=None, | ||
| ) |
There was a problem hiding this comment.
Make this test actually remove added_date.
DummyAsset replaces added_date=None with asset_date, so this test currently exercises filtering by added_date, not the intended fallback to asset_date.
Proposed fix
old_asset = DummyAsset(
"asset-old",
"old.jpg",
asset_date=datetime.now(timezone.utc) - timedelta(days=10),
added_date=None,
)
+ recent_asset.added_date = None
+ old_asset.added_date = None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| recent_asset = DummyAsset( | |
| "asset-recent", | |
| "recent.jpg", | |
| asset_date=datetime.now(timezone.utc) - timedelta(hours=1), | |
| added_date=None, | |
| ) | |
| old_asset = DummyAsset( | |
| "asset-old", | |
| "old.jpg", | |
| asset_date=datetime.now(timezone.utc) - timedelta(days=10), | |
| added_date=None, | |
| ) | |
| recent_asset = DummyAsset( | |
| "asset-recent", | |
| "recent.jpg", | |
| asset_date=datetime.now(timezone.utc) - timedelta(hours=1), | |
| added_date=None, | |
| ) | |
| old_asset = DummyAsset( | |
| "asset-old", | |
| "old.jpg", | |
| asset_date=datetime.now(timezone.utc) - timedelta(days=10), | |
| added_date=None, | |
| ) | |
| recent_asset.added_date = None | |
| old_asset.added_date = None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/services/test_photos_sync.py` around lines 784 - 795, The test
currently passes added_date=None which DummyAsset coerces back to asset_date;
instead, after constructing recent_asset and old_asset (created via DummyAsset),
explicitly delete their added_date attributes (e.g., del recent_asset.added_date
and del old_asset.added_date) so the code under test must fall back to
asset_date; update references to recent_asset and old_asset accordingly to
ensure the filter exercises the fallback path.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
pyicloud/services/photos_cloudkit/client.py (1)
191-193:⚠️ Potential issue | 🟡 MinorNormalise upload response validation failures.
PhotosUploadResponse.model_validate(...)can still leak a PydanticValidationErrorfor malformed successful upload payloads. Keep the Photos client boundary consistently onCloudKitApiError, asbatch_countnow does.🐛 Proposed fix
- data = PhotosUploadResponse.model_validate( - self._response_json(response, context="Photos upload") - ) + raw_data = self._response_json(response, context="Photos upload") + try: + data = PhotosUploadResponse.model_validate(raw_data) + except ValidationError as exc: + raise CloudKitApiError( + "Photos upload response validation failed", + payload=raw_data, + ) from excRun this read-only check to confirm the upload path is wrapped and covered:
#!/bin/bash # Description: Verify upload response validation handling and related tests. sed -n '177,199p' pyicloud/services/photos_cloudkit/client.py rg -nP -C3 'PhotosUploadResponse\.model_validate|except ValidationError|test_upload_file_.*malformed|ValidationError' \ pyicloud/services/photos_cloudkit/client.py tests/services/test_photos_cloudkit_client.py🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyicloud/services/photos_cloudkit/client.py` around lines 191 - 193, The PhotosUploadResponse.model_validate call in the upload path can raise a pydantic ValidationError which currently leaks out; wrap the call inside a try/except that catches pydantic.errors.ValidationError (or pydantic.ValidationError) and re-raise a CloudKitApiError with a clear message and include original error/context (use the same context string used in _response_json like "Photos upload"); update the upload handler where PhotosUploadResponse, self._response_json and variable data are used to ensure any validation failures are converted to CloudKitApiError consistent with how batch_count errors are normalized.
🧹 Nitpick comments (1)
pyicloud/services/photos_cloudkit/service.py (1)
633-669: Deadif cls is PhotoAlbumbranch — both arms are identical.After
clsis potentially reassigned toPhotoAlbumFolderat line 634, theif cls is PhotoAlbum: return cls(...)block at lines 635–652 and thereturn cls(...)at lines 653–669 pass the same keyword arguments. The conditional therefore has no behavioural effect and is pure duplication — the whole block can collapse to a singlereturn cls(...).♻️ Proposed simplification
- if cls is PhotoAlbum: - return cls( - library=self, - name=album_name, - record_id=record_name(record), - obj_type=obj_type, - list_type=list_type, - direction=direction, - query_filter=query_filter, - client=self._client, - zone_id=self.zone_id, - query_filters=typed_query_filters, - parent_id=cast(Optional[str], record_field_value(record, "parentId")), - record_change_tag=record_change_tag(record), - record_modification_date=record_field_value( - record, "recordModificationDate" - ), - ) return cls( library=self, name=album_name, ... )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyicloud/services/photos_cloudkit/service.py` around lines 633 - 669, The two branches returning cls(...) are identical and the conditional if cls is PhotoAlbum is dead; remove the duplication by deleting the conditional branch and collapse to a single return cls(...) after the cls assignment (where cls may be set to PhotoAlbumFolder when album_type is AlbumTypeEnum.FOLDER), preserving all existing keyword arguments (library, name, record_id via record_name(record), obj_type, list_type, direction, query_filter, client=self._client, zone_id=self.zone_id, query_filters=typed_query_filters, parent_id=cast(Optional[str], record_field_value(record, "parentId")), record_change_tag=record_change_tag(record), record_modification_date=record_field_value(record, "recordModificationDate")) so behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyicloud/common/cloudkit/client.py`:
- Around line 85-87: The debug log is printing full CloudKit URLs (which may
contain dsid query params or signed asset URLs); update the CloudKit client
logging in the post method (def post(...)) and the corresponding get method to
avoid leaking sensitive query parameters by logging only the endpoint path or a
redacted placeholder instead of the full URL (e.g., use the input path or strip
query string/signed tokens before logging); ensure LOGGER.debug calls no longer
reference the full URL built by build_url but a safe representation.
In `@pyicloud/services/photos_cloudkit/materialize.py`:
- Around line 104-107: The code currently calls taken_at.astimezone() before
formatting for EXIF which shifts the asset's wall-clock time; remove the
.astimezone() call and pass taken_at.strftime("%Y:%m:%d %H:%M:%S") directly to
_insert_exif_datetime_segment (i.e., change the timestamp argument in the call
that assigns updated) so the original timezone-aware timestamp returned by
_asset_datetime() is preserved.
In `@pyicloud/services/photos_legacy.py`:
- Around line 1228-1259: The legacy PhotoAlbum._get_len implementation assumes
the batch response is fully populated and directly indexes
response["batch"][0]["records"][0"]["fields"]["itemCount"]["value"], which
raises IndexError/KeyError on empty or error-shaped payloads and aborts album
iteration; update _get_len to mirror the hardened logic used in
photos_cloudkit/service.py::PhotoAlbum._get_len by validating the response
structure (check "batch", non-empty records, "fields" and "itemCount"/"value")
and catching IndexError/KeyError/TypeError/ValueError around the extraction, log
a clear message via the same logger used elsewhere, and return 0 on
missing/error cases so callers (e.g. photos generator using len(self) - 1)
continue gracefully; reference symbols: _get_len, self.service.session.post,
self._get_container_id, self._zone_id.
- Around line 1037-1041: The current PhotoAlbum.fullname implementation can
raise KeyError or recurse infinitely via malformed/cyclic parent links; change
fullname to perform an iterative upward walk using a visited set to detect
cycles and stop if a parent id is missing instead of indexing AlbumContainer
directly (avoid relying on AlbumContainer.__getitem__ to always succeed). Mirror
the hardened approach used in PhotoAlbum.fullname in photos_cloudkit/service.py:
start from self, loop up reading parent ids from self._parent_id and retrieving
parents from self._library.albums only if present, break and mark missing/stop
if a parent lookup fails, and return a joined path built from the collected
names; ensure cycles are detected by tracking seen ids to prevent infinite
loops.
---
Duplicate comments:
In `@pyicloud/services/photos_cloudkit/client.py`:
- Around line 191-193: The PhotosUploadResponse.model_validate call in the
upload path can raise a pydantic ValidationError which currently leaks out; wrap
the call inside a try/except that catches pydantic.errors.ValidationError (or
pydantic.ValidationError) and re-raise a CloudKitApiError with a clear message
and include original error/context (use the same context string used in
_response_json like "Photos upload"); update the upload handler where
PhotosUploadResponse, self._response_json and variable data are used to ensure
any validation failures are converted to CloudKitApiError consistent with how
batch_count errors are normalized.
---
Nitpick comments:
In `@pyicloud/services/photos_cloudkit/service.py`:
- Around line 633-669: The two branches returning cls(...) are identical and the
conditional if cls is PhotoAlbum is dead; remove the duplication by deleting the
conditional branch and collapse to a single return cls(...) after the cls
assignment (where cls may be set to PhotoAlbumFolder when album_type is
AlbumTypeEnum.FOLDER), preserving all existing keyword arguments (library, name,
record_id via record_name(record), obj_type, list_type, direction, query_filter,
client=self._client, zone_id=self.zone_id, query_filters=typed_query_filters,
parent_id=cast(Optional[str], record_field_value(record, "parentId")),
record_change_tag=record_change_tag(record),
record_modification_date=record_field_value(record, "recordModificationDate"))
so behavior is unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5ed583f2-7dbb-491d-aac7-fed55f835f3b
📒 Files selected for processing (14)
examples.pypyicloud/cli/commands/photos.pypyicloud/common/cloudkit/client.pypyicloud/services/photos_cloudkit/client.pypyicloud/services/photos_cloudkit/mappers.pypyicloud/services/photos_cloudkit/materialize.pypyicloud/services/photos_cloudkit/service.pypyicloud/services/photos_cloudkit/sync.pypyicloud/services/photos_legacy.pytests/fixtures/photos_browser_mutations/README.mdtests/services/test_photos.pytests/services/test_photos_cloudkit_client.pytests/services/test_photos_sync.pytests/test_cmdline.py
✅ Files skipped from review due to trivial changes (2)
- tests/fixtures/photos_browser_mutations/README.md
- pyicloud/services/photos_cloudkit/mappers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- pyicloud/services/photos_cloudkit/sync.py
|
@MrJarnould - thanks again for your support and updates. Are you able to resolve the merge conflicts and I'll get this merged. Thanks again. |
Breaking change
None.
Proposed change
This PR hardens Photos CloudKit sync path handling so remote photo metadata and persisted sync state cannot cause local filesystem writes or deletes outside the configured sync directory.
The sync engine now sanitizes remote resource filenames, normalizes folder-structure output into safe relative path components, and resolves final target paths through a containment check before writing downloaded files, applying local EXIF/XMP metadata updates, or deleting stale local files during
--auto-delete.This addresses a SonarCloud path-injection finding around EXIF metadata writes by fixing the path construction boundary rather than only changing the highlighted write call.
Type of change
Example of code:
Additional information
pythonsecurity:S2083path-injection finding in Photos CloudKit materializationTested locally with:
Results:
Checklist
If user exposed functionality or configuration variables are added/changed: